Mustango is a novel multimodal large language model specifically designed for controllable music generation, combining Latent Diffusion Model (LDM), Flan-T5, and music features to achieve high-quality text-to-music generation.
Text-to-Audio
Transformers